fix: use iconv_strlen in kanji mode by vlakoff · Pull Request #200 · Bacon/BaconQrCode

vlakoff · 2025-11-11T05:20:04Z

This fixes the unreadable QR code issue in Kanji mode when using Shift-JIS encoding (see #172).

PR #173 attempted to resolve this by forcing Byte mode, which worked but sacrificed the efficiency of Kanji mode.

In a comment on #173, I discovered the root cause: strlen() was incorrectly used to count characters, leading to misalignment in encoding.

Replacing it with iconv_strlen($content, 'utf-8') ensures accurate character counting for multibyte strings, preserving proper Kanji mode behavior and producing valid QR codes.

vlakoff · 2025-11-11T05:57:23Z

Just a couple of thoughts:

I had a look at chooseMode. It currently iterates over the string on a byte basis. While this works in practice, a proper (although presumably slower) multibyte-aware iteration might be considered—though it’s not straightforward to implement using iconv alone (i.e., without relying on mbstring).
Edit: A multibyte-aware loop would bring no benefit in this case, because the loop exits as soon as it encounters a byte that is neither numeric nor alphanumeric—both of which operate on single-byte characters.
In the Encoder::encode() at hand, strlen() is sufficient when the mode is Mode::ALPHANUMERIC() or Mode::NUMERIC(). So iconv_strlen() is only strictly necessary for Mode::KANJI(), at the moment.
Therefore, there might be a small optimization opportunity here: keeping strlen() for the Mode::ALPHANUMERIC() and Mode::NUMERIC() modes could avoid the overhead of iconv_strlen() when it's not needed.
Edit: I've added a commit for this. strlen() is O(1), which is great.

DASPRiD · 2025-11-15T12:23:13Z

Please rebase against main to get the tests working.

Fixes unreadable QR codes in Kanji mode with Shift-JIS encoding (see Bacon#172). PR Bacon#173 worked around the issue by forcing Byte mode, but that lost Kanji mode’s efficiency. The actual cause was using strlen() to count characters. Replacing it with iconv_strlen($content, 'utf-8') ensures correct character count and restores proper Kanji encoding.

vlakoff · 2025-11-16T09:11:15Z

Rebased. Please let me know what you would like to see improved (including commit messages) before considering this for merging.

DASPRiD · 2025-11-16T15:01:36Z

Looks like there's something wrong with your match syntax?

codecov · 2025-11-16T15:51:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.81%. Comparing base (bd2370d) to head (394a4fb).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #200      +/-   ##
============================================
+ Coverage     70.58%   70.81%   +0.22%     
+ Complexity      995      994       -1     
============================================
  Files            49       49              
  Lines          3182     3169      -13     
============================================
- Hits           2246     2244       -2     
+ Misses          936      925      -11

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

DASPRiD · 2025-11-16T15:53:29Z

LGTM, thanks! :)

vlakoff · 2025-11-16T15:56:13Z

Looks like there's something wrong with your match syntax?

It's a shame, multiple expressions cannot be on separate lines…

Furthermore, default cannot be grouped with other expressions, i.e. the following is not possible:

Mode::KANJI(), default => iconv_strlen($content, 'utf-8'),

DASPRiD · 2025-11-16T15:57:26Z

Actually, I just looked through the other modes – are we certain that they all require iconv_strlen?

vlakoff · 2025-11-16T16:09:19Z

I haven’t really dug into the other modes, so I can’t speak with confidence there. My changes were scoped specifically to Kanji mode.

Alternatively, it could look like this:

$numLetters = match ($mode) {
    Mode::BYTE()  => $dataBits->getSizeInBytes(),
    Mode::KANJI() => iconv_strlen($content, 'utf-8'),
    default       => strlen($content),
};

That said, I can’t say for sure whether this covers all cases correctly.

DASPRiD · 2025-11-16T16:26:41Z

Well, as chooseMode in this context will only return those known modes, how about this:

Match on all expected modes coming from chooseMode, and in the default case throw an exception about an unexpected mode.

vlakoff · 2025-11-16T16:46:19Z

I had thought about this approach too (explicitly matching Mode::KANJI() and omitting the default branch).

I had discarded it because any hypothetical future mode addition could suddenly cause breakage, hence the idea of a default/fallback

But overall, that could still be an acceptable approach.

Note that match already throws an UnhandledMatchError on unmatched cases. But I suppose you would rather want to explicitly throw an exception.

DASPRiD · 2025-11-16T17:04:07Z

oh, UnhandledMatchError is totally fine, I didn't know about that. But if somebody modifies chooseMode and adds new modes, they would then be warned by the match if it isn't handled explicitly, which is exactly what we want :)

For NUMERIC and ALPHANUMERIC modes, strlen() is sufficient since these modes only operate on single-byte characters. strlen() runs in O(1) time and avoids the overhead of iconv_strlen().

vlakoff · 2025-11-16T18:15:37Z

Oh, right — encode() calls chooseMode() right at the beginning, which ensures the possible values for $mode.

And since chooseMode() is a private method only invoked there, its sole purpose is to simplify the logic within encode(), so the two are tightly coupled.

I’ve updated the PR accordingly, and also took the opportunity to rewrite appendBytes() using a match for consistency.

Nicer, and more consistent with some other code in the encode() method. Invalid modes are really not expected here, as this private method is called only by encode(), and the mode is determined right at the beginning of encode() by calling chooseMode(). Though, if an invalid mode ever comes here, the match would throw an UnhandledMatchError.

vlakoff force-pushed the shift-jis branch 3 times, most recently from 1e7eea5 to 8bf099a Compare November 12, 2025 10:35

vlakoff changed the title ~~Fix Kanji mode QR code generation~~ fix: Kanji mode QR code generation Nov 14, 2025

DASPRiD mentioned this pull request Nov 15, 2025

Fix unreadable Shift-JIS encoded QR Code #173

Closed

vlakoff added 2 commits November 16, 2025 10:03

Test: Cover exclusive vs. mixed Shift-JIS data for Kanji mode

a7e2d7b

vlakoff force-pushed the shift-jis branch from 8bf099a to f74843a Compare November 16, 2025 09:10

vlakoff force-pushed the shift-jis branch 2 times, most recently from 39a90d6 to 00938aa Compare November 16, 2025 15:51

Use strlen() for numeric and alphanumeric modes to avoid iconv overhead

97111fc

For NUMERIC and ALPHANUMERIC modes, strlen() is sufficient since these modes only operate on single-byte characters. strlen() runs in O(1) time and avoids the overhead of iconv_strlen().

vlakoff force-pushed the shift-jis branch from 00938aa to edabeb8 Compare November 16, 2025 18:14

vlakoff force-pushed the shift-jis branch from edabeb8 to 394a4fb Compare November 16, 2025 18:17

DASPRiD changed the title ~~fix: Kanji mode QR code generation~~ fix: use iconv_strlen in kanji mode Nov 16, 2025

DASPRiD merged commit 38d3c55 into Bacon:main Nov 16, 2025
9 checks passed

github-actions bot mentioned this pull request Nov 16, 2025

chore(main): release 3.0.2 #207

Merged

vlakoff mentioned this pull request Nov 16, 2025

Kanji Mode QR code is broken #172

Closed

vlakoff deleted the shift-jis branch November 17, 2025 22:06

Comments

Conversation

vlakoff commented Nov 11, 2025

Uh oh!

vlakoff commented Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DASPRiD commented Nov 15, 2025

Uh oh!

vlakoff commented Nov 16, 2025

Uh oh!

DASPRiD commented Nov 16, 2025

Uh oh!

codecov bot commented Nov 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

DASPRiD commented Nov 16, 2025

Uh oh!

vlakoff commented Nov 16, 2025

Uh oh!

DASPRiD commented Nov 16, 2025

Uh oh!

vlakoff commented Nov 16, 2025

Uh oh!

DASPRiD commented Nov 16, 2025

Uh oh!

vlakoff commented Nov 16, 2025

Uh oh!

DASPRiD commented Nov 16, 2025

Uh oh!

vlakoff commented Nov 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vlakoff commented Nov 11, 2025 •

edited

Loading

codecov bot commented Nov 16, 2025 •

edited

Loading